Conversion between Scripts of Punjabi: Beyond Simple Transliteration

نویسندگان

  • Gurpreet Singh Lehal
  • Tejinder Singh Saini
چکیده

This paper describes statistical techniques used for modelling transliteration systems between the scripts of Punjabi language. Punjabi is one of the unique languages, which are written in more than one script. In India, Punjabi is written in Gurmukhi script, while in Pakistan it is written in Shahmukhi (Perso-Arabic) script. Shahmukhi script has its origin in the ancient Phoenician script whereas Gurmukhi script has its origin in the ancient Brahmi script. Whilst in speech Punjabi spoken in the Eastern and the Western parts is mutually comprehensible, in the written form it is not so. This has created a script wedge as majority of Punjabi speaking people in Pakistan cannot read Gurmukhi script, and similarly the majority of Punjabi speaking people in India cannot comprehend Shahmukhi script. In this paper, we present an advanced and highly accurate transliteration system between Gurmukhi and Shahmukhi scripts of Punjabi language which addresses various challenges such as multiple/zero character mappings, missing vowels, word segmentation, variations in pronunciations and orthography and transliteration of proper nouns etc. by generating efficient algorithms along with special rules and using various lexical resources such as Gurmukhi spell checker, corpora of both scripts, Gurmukhi-Shahmukhi transliteration dictionaries, statistical language models etc. The proposed system attains more than 98.6% accuracy at word level while transliterating Gurmukhi text to Shahmukhi. The reverse part i.e. transliterating from Shahmukhi text to Gurmukhi is more complex and challenging but our system has achieved 97% accuracy at word level in this part too.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Punjabi Machine Transliteration

Machine Transliteration is to transcribe a word written in a script with approximate phonetic equivalence in another language. It is useful for machine translation, cross-lingual information retrieval, multilingual text and speech processing. Punjabi Machine Transliteration (PMT) is a special case of machine transliteration and is a process of converting a word from Shahmukhi (based on Arabic s...

متن کامل

Shahmukhi to Gurmukhi Transliteration System

The existence of two scripts for Punjabi language has created a script barrier between the Punjabi literature written in India and Pakistan. This research has developed a new system for the first time of its kind for Shahmukhi text without diacritical marks. The purposed system for Shahmukhi to Gurmukhi transliteration has been implemented with various research techniques based on language corp...

متن کامل

Shahmukhi to Gurmukhi Transliteration System: A Corpus based Approach

This research paper describes a corpus based transliteration system for Punjabi language. The existence of two scripts for Punjabi language has created a script barrier between the Punjabi literature written in India and in Pakistan. This research project has developed a new system for the first time of its kind for Shahmukhi script of Punjabi language. The proposed system for Shahmukhi to Gurm...

متن کامل

Statistical Approach to Transliteration from English to Punjabi

-Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Transliteration is a crucial factor in CLIR and MT. It is important for Machine Translation, especially when the languages do not use the same scripts. This paper addresses the issue of statistical mach...

متن کامل

Sangam: A Perso-Arabic to Indic Script Machine Transliteration Model

Indian sub-continent is one of those unique parts of the world where single languages are written in different scripts. This is the case for example with Punjabi, written in Indian East Punjab in Gurmukhi script (a Left to Right script based on Devnagri) and in Pakistani West Punjab, it is written in Shahmukhi (a Right to Left script based on Perso-Arabic). This is also the case with other lang...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012